
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?

Neural Information Processing Systems



BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?

Mayo, David, Wang, Christopher, Harbin, Asa, Alabdulkareem, Abdulrahman, Shaw, Albert Eaton, Katz, Boris, Barbu, Andrei

arXiv.org Artificial Intelligence

When evaluating stimuli reconstruction results it is tempting to assume that higher fidelity text and image generation is due to an improved understanding of the brain or more powerful signal extraction from neural recordings. However, in practice, new reconstruction methods could improve performance for at least three other reasons: learning more about the distribution of stimuli, becoming better at reconstructing text or images in general, or exploiting weaknesses in current image and/or text evaluation metrics. Here we disentangle how much of the reconstruction is due to these other factors vs. productively using the neural recordings. We introduce BrainBits, a method that uses a bottleneck to quantify the amount of signal extracted from neural recordings that is actually necessary to reproduce a method's reconstruction fidelity. We find that it takes surprisingly little information from the brain to produce reconstructions with high fidelity. In these cases, it is clear that the priors of the methods' generative models are so powerful that the outputs they produce extrapolate far beyond the neural signal they decode. Given that reconstructing stimuli can be improved independently by either improving signal extraction from the brain or by building more powerful generative models, improving the latter may fool us into thinking we are improving the former. We propose that methods should report a method-specific random baseline, a reconstruction ceiling, and a curve of performance as a function of bottleneck size, with the ultimate goal of using more of the neural recordings.


Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance

Camboulin, Charles, Doimo, Diego, Glielmo, Aldo

arXiv.org Machine Learning

This work presents an analysis of the hidden representations of Variational Autoencoders (VAEs) using the Intrinsic Dimension (ID) and the Information Imbalance (II). We show that VAEs undergo a transition in behaviour once the bottleneck size is larger than the ID of the data, manifesting in a double hunchback ID profile and a qualitative shift in information processing as captured by the II. Our results also highlight two distinct training phases for architectures with sufficiently large bottleneck sizes, consisting of a rapid fit and a slower generalisation, as assessed by a differentiated behaviour of ID, II, and KL loss. These insights demonstrate that II and ID could be valuable tools for aiding architecture search, for diagnosing underfitting in VAEs, and, more broadly, they contribute to advancing a unified understanding of deep generative models through geometric analysis.
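"Intrinsic Dimension" here refers to estimators such as TwoNN, which exploits the fact that the ratio of second- to first-nearest-neighbour distances follows a Pareto law with exponent equal to the ID. A minimal sketch on synthetic data (an assumed illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate from pairwise distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                 # exclude self-distances
    D.sort(axis=1)
    mu = D[:, 1] / D[:, 0]                      # r2 / r1 per point
    return len(X) / np.sum(np.log(mu))          # maximum-likelihood exponent

# A 2-D manifold embedded linearly in 10-D: the estimate should be near 2,
# even though the ambient (bottleneck-like) dimension is 10.
Z = rng.uniform(size=(500, 2))
X = Z @ rng.normal(size=(2, 10))
print(round(twonn_id(X), 2))
```

Comparing such an estimate against the bottleneck size is the kind of diagnostic the abstract suggests: a bottleneck wider than the data's ID marks the regime where the described transition occurs.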


Kolmogorov-Arnold Network Autoencoders

Moradi, Mohammadamin, Panahi, Shirin, Bollt, Erik, Lai, Ying-Cheng

arXiv.org Artificial Intelligence

Deep learning models have revolutionized various domains, with Multi-Layer Perceptrons (MLPs) being a cornerstone for tasks like data regression and image classification. However, a recent study has introduced Kolmogorov-Arnold Networks (KANs) as promising alternatives to MLPs, leveraging activation functions placed on edges rather than nodes. This structural shift aligns KANs closely with the Kolmogorov-Arnold representation theorem, potentially enhancing both model accuracy and interpretability. In this study, we explore the efficacy of KANs in the context of data representation via autoencoders, comparing their performance with traditional Convolutional Neural Networks (CNNs) on the MNIST, SVHN, and CIFAR-10 datasets. Our results demonstrate that KAN-based autoencoders achieve competitive performance in terms of reconstruction accuracy, thereby suggesting their viability as effective tools in data analysis tasks.
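The structural shift (learnable univariate functions on edges rather than fixed activations on nodes) can be illustrated with a toy layer. This sketch substitutes Gaussian bumps for the B-spline bases typically used in KANs, so it shows the structure rather than a faithful implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def kan_layer(x, coeffs, centers, width=0.5):
    """One KAN-style layer: a learnable univariate function phi on every edge,
    parameterised here as a weighted sum of Gaussian bumps (a stand-in for
    the B-spline bases in the original paper).  coeffs: (out, in, n_basis)."""
    # basis[i, b] = bump b evaluated at input feature i
    basis = np.exp(-(((x[:, None] - centers[None, :]) / width) ** 2))
    # y_j = sum_i phi_{j,i}(x_i) = sum_i sum_b coeffs[j, i, b] * basis[i, b]
    return np.einsum('jib,ib->j', coeffs, basis)

in_dim, out_dim, n_basis = 4, 2, 8
centers = np.linspace(-1, 1, n_basis)
coeffs = rng.normal(size=(out_dim, in_dim, n_basis))

x = rng.uniform(-1, 1, size=in_dim)
y = kan_layer(x, coeffs, centers)
print(y.shape)                         # one activation per output node
```

A KAN autoencoder stacks such layers in place of the dense-plus-activation layers of an MLP encoder/decoder; the per-edge functions are what the coefficients would learn during training.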


Associative Transformer

Sun, Yuwei, Ochiai, Hideya, Wu, Zhirong, Lin, Stephen, Kanai, Ryota

arXiv.org Artificial Intelligence

Emerging from the pairwise attention in conventional Transformers, there is a growing interest in sparse attention mechanisms that align more closely with localized, contextual learning in the biological brain. Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs. However, these methods are parameter inefficient and fail in more complex relational reasoning tasks. To this end, we propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches, improving parameter efficiency and performance in relational reasoning tasks.

Sparse knowledge association can find resonance with the neuroscientific grounding of the Global Workspace Theory (GWT) (Baars, 1988; Dehaene et al., 1998; VanRullen & Kanai, 2020; Juliani et al., 2022). GWT explains a fundamental cognitive architecture for working memory in the brain where diverse specialized modules compete to write information into a shared workspace through a communication bottleneck. The bottleneck facilitates the processing of content-addressable information using attention guided by contents in the shared workspace (Awh et al., 2006; Gazzaley & Nobre, 2012). A bottleneck guides models to generalize in a manner consistent with the underlying data distribution through inductive biases of sparsity (Baxter, 2000; Goyal & Bengio, 2022), resulting in superior performance in tasks such as relational reasoning.
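The shared-workspace bottleneck amounts to cross-attention in which a handful of slot queries compete to read from many input patches. A minimal sketch with random weights (illustrating the mechanism, not AiT's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def workspace_write(slots, patches, Wq, Wk, Wv):
    """Cross-attention bottleneck: a few workspace slots (queries) read from
    many input patches (keys/values), as in GWT-inspired models."""
    q, k, v = slots @ Wq, patches @ Wk, patches @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (n_slots, n_patches)
    return attn @ v                                   # compressed workspace

d = 16
n_slots, n_patches = 4, 64            # bottleneck: 4 slots for 64 patches
slots = rng.normal(size=(n_slots, d))
patches = rng.normal(size=(n_patches, d))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
out = workspace_write(slots, patches, Wq, Wk, Wv)
print(out.shape)
```

However many patches arrive, only n_slots vectors of information pass through, which is the communication bottleneck the abstract describes.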


Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving

Echterhoff, Jessica, Yan, An, Han, Kyungtae, Abdelraouf, Amr, Gupta, Rohit, McAuley, Julian

arXiv.org Artificial Intelligence

Concept bottleneck models have been successfully used for explainable machine learning by encoding information within the model through a set of human-defined concepts. In the context of human-assisted or autonomous driving, explainability models can improve user acceptance and understanding of the decisions made by the autonomous vehicle, and can be used to rationalize and explain driver or vehicle behavior. We propose a new approach that uses concept bottlenecks as visual features for control-command prediction and for explaining user and vehicle behavior. We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands. This approach can then be used to determine whether a change in preferred gap or steering commands from a human (or autonomous vehicle) is caused by an external stimulus or by a change in preferences. We achieve performance competitive with latent visual features while gaining interpretability within our model setup.
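The two-stage structure can be sketched as features → named concepts → control command, where the intermediate concepts are readable. The concept names and linear heads below are hypothetical placeholders, not the paper's learned concepts:

```python
import numpy as np

rng = np.random.default_rng(4)

def concept_bottleneck(features, Wc, Wy, concept_names):
    """Two-stage prediction: visual features -> human-named concepts ->
    control command, so the intermediate layer can be inspected."""
    concepts = 1.0 / (1.0 + np.exp(-(features @ Wc)))   # each score in [0, 1]
    command = concepts @ Wy                              # e.g. a steering value
    return dict(zip(concept_names, concepts)), float(command)

# Hypothetical concepts for a driving scene (illustrative only).
names = ["lead_vehicle_close", "lane_curving_left", "pedestrian_ahead"]
Wc = rng.normal(size=(8, 3))                             # features -> concepts
Wy = rng.normal(size=3)                                  # concepts -> command
concepts, steer = concept_bottleneck(rng.normal(size=8), Wc, Wy, names)
print(concepts, steer)
```

Because the command depends on the features only through the named concept scores, a change in the output can be traced back to a change in a specific concept, which is what enables the stimulus-vs-preference attribution described above.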


AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion

Choi, Haeyun, Gim, Jio, Lee, Yuho, Kim, Youngin, Suh, Young-Joo

arXiv.org Artificial Intelligence

This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality due to their reliance on a carefully designed bottleneck structure. Moreover, models relying solely on a self-reconstruction loss struggle to reproduce different speakers' voices. To address these issues, we suggest a cycle-consistency loss that considers conversion back and forth between the target and source speakers. Additionally, stacked random-shuffled mel-spectrograms and a label-smoothing method are utilized during speaker encoder training to extract a time-independent global speaker representation from speech, which is the key to zero-shot conversion. Our model outperforms existing state-of-the-art results in both subjective and objective evaluations. Furthermore, it facilitates cross-lingual voice conversion and enhances the quality of synthesized speech.
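The cycle-consistency loss can be sketched with a stand-in converter: convert source speech to the target speaker and back, then penalise the distance to the original mel-spectrogram. The three-argument `convert` signature and the offset-based toy converter are illustrative assumptions, not the paper's model:

```python
import numpy as np

def cycle_consistency_loss(convert, mel, spk_s, spk_t):
    """Round-trip constraint: source -> target speaker -> back to source,
    then mean-squared distance to the original mel-spectrogram."""
    mel_conv = convert(mel, spk_s, spk_t)        # forward conversion
    mel_back = convert(mel_conv, spk_t, spk_s)   # ... and back again
    return float(np.mean((mel - mel_back) ** 2))

# Stand-in converter: shifts the spectrogram by the speaker difference,
# so a perfect round trip recovers the input (up to float rounding).
toy_convert = lambda mel, spk_from, spk_to: mel + (spk_to - spk_from)

mel = np.random.default_rng(5).normal(size=(80, 100))   # 80 mel bins, 100 frames
loss = cycle_consistency_loss(toy_convert, mel, spk_s=0.0, spk_t=1.0)
print(loss)   # effectively zero for this invertible toy converter
```

For a real converter the loss is not zero, and minimising it pushes the model to preserve content through the conversion without relying on a hand-tuned bottleneck.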


Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Su, Tung-Cheng, Chang, Yung-Chuan, Liu, Yi-Wen

arXiv.org Artificial Intelligence

Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.


Training Invertible Neural Networks as Autoencoders

Nguyen, The-Gia Leo, Ardizzone, Lynton, Köthe, Ullrich

arXiv.org Artificial Intelligence

Autoencoders are able to learn useful data representations in an unsupervised manner and have been widely used in various machine learning and computer vision tasks. In this work, we present methods to train Invertible Neural Networks (INNs) as (variational) autoencoders, which we call INN (variational) autoencoders. Our experiments on MNIST, CIFAR, and CelebA show that for small bottleneck sizes our INN autoencoder achieves results similar to the classical autoencoder, while for large bottleneck sizes it outperforms its classical counterpart. Based on these empirical results, we hypothesize that INN autoencoders might not have any intrinsic information loss and are therefore not bound to a maximal number of layers (depth) beyond which only suboptimal results can be achieved.
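Why an INN need not lose information can be seen with a single additive coupling block: the map is exactly invertible, so the only place information is discarded is the explicit zeroing of latent dimensions that plays the role of the bottleneck. A minimal sketch under these assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(6)

def coupling_forward(x, W):
    """Additive coupling block (exactly invertible): split x in half and
    shift the second half by a function of the first."""
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + np.tanh(x1 @ W)])

def coupling_inverse(y, W):
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, y2 - np.tanh(y1 @ W)])

def inn_autoencode(x, W, k):
    """Use the INN as an autoencoder: run forward, keep only the first k
    latent dimensions (zeroing the rest is the bottleneck), then invert."""
    z = coupling_forward(x, W)
    z[k:] = 0.0
    return coupling_inverse(z, W)

d, k = 8, 6
W = 0.5 * rng.normal(size=(d // 2, d // 2))
x = rng.normal(size=d)

# Full-width latent: the inverse recovers x (no intrinsic information loss).
x_full = coupling_inverse(coupling_forward(x, W), W)
print(np.allclose(x, x_full))   # True

x_rec = inn_autoencode(x, W, k)  # lossy only once the bottleneck bites
```

Stacking more such blocks keeps the composite map invertible, which is the intuition behind the hypothesis that INN autoencoders are not bound to a maximal depth.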